STA6235: Modeling in Regression
Consider the gamma distribution, f(y|\mu, \gamma) = \frac{1}{\Gamma(\gamma) \left( \frac{\mu}{\gamma} \right)^\gamma} y^{\gamma-1} \exp\left\{ \frac{-y \gamma}{\mu} \right\}
where: y > 0, \mu > 0, \gamma > 0, and \Gamma(\cdot) is the Gamma function
This is appropriate for continuous, positive data that are right-skewed.
I have primarily used it for complete (uncensored) time-to-event data.
The canonical link is the negative inverse…
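As a quick check on the mean/shape parameterization of the density above, the sketch below compares it with R's built-in dgamma(), assuming shape = \gamma and scale = \mu/\gamma. The helper name dgamma_mu and the values of mu and gam are made up for illustration.

```r
# Density from the formula above, parameterized by mean mu and shape gam
# (dgamma_mu is a hypothetical helper name, not part of base R)
dgamma_mu <- function(y, mu, gam) {
  1 / (gamma(gam) * (mu / gam)^gam) * y^(gam - 1) * exp(-y * gam / mu)
}

y   <- c(0.5, 1, 2, 5, 10)  # arbitrary positive values
mu  <- 5.6                  # illustrative mean
gam <- 30                   # illustrative shape

by_hand  <- dgamma_mu(y, mu, gam)
built_in <- dgamma(y, shape = gam, scale = mu / gam)

# TRUE if the two parameterizations agree up to numerical tolerance
all.equal(by_hand, built_in)
```

Under this parameterization the mean is \mu and the variance is \mu^2/\gamma, so the variance grows with the mean.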
We use the glm() function to perform Gamma regression, with link = "log" attached to the family: family = Gamma(link = "log").
Call:
glm(formula = HbA1c ~ age + as.factor(BMI3cat), family = Gamma(link = "log"),
data = data)
Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)          1.6245542  0.0168291  96.532  < 2e-16 ***
age                  0.0033199  0.0003007  11.042  < 2e-16 ***
as.factor(BMI3cat)1 -0.0635853  0.0078001  -8.152 5.55e-16 ***
as.factor(BMI3cat)2 -0.1101423  0.0108309 -10.169  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Gamma family taken to be 0.03165908)
Null deviance: 74.373 on 2563 degrees of freedom
Residual deviance: 66.484 on 2560 degrees of freedom
AIC: 6939
Number of Fisher Scoring iterations: 4
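The course dataset is not included in these notes, so here is a minimal sketch of the same kind of call on simulated data. Everything about the data frame sim (sample size, age range, true coefficients, shape) is an assumption chosen only to resemble the output above.

```r
set.seed(6235)

# Simulated stand-in for the course data (all values here are hypothetical)
n   <- 500
sim <- data.frame(age     = runif(n, 20, 80),
                  BMI3cat = sample(0:2, n, replace = TRUE))

# True mean on the response scale, built through a log link
mu <- exp(1.6 + 0.003 * sim$age -
          0.06 * (sim$BMI3cat == 1) - 0.11 * (sim$BMI3cat == 2))

# Gamma outcome with mean mu and shape 30
sim$HbA1c <- rgamma(n, shape = 30, rate = 30 / mu)

# Gamma regression with a log link, as in the notes
m1 <- glm(HbA1c ~ age + as.factor(BMI3cat),
          family = Gamma(link = "log"), data = sim)

summary(m1)             # same layout as the output above
summary(m1)$dispersion  # the estimated dispersion parameter
```

The estimates will not match the HbA1c output, since the data are simulated; the point is only the structure of the call.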
\ln(y) = 1.625 + 0.003 \text{ age} - 0.064 \text{ BMI}_1 - 0.110 \text{ BMI}_2
Uh oh. We are now modeling ln(y) (strictly, the log of the expected value of y) and not y directly…
We will transform the coefficients:
\begin{align*} \ln(y) &= \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k \\ y &= \exp\left\{\hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k \right\} \\ y &= e^{\hat{\beta}_0} e^{\hat{\beta}_1 x_1} e^{\hat{\beta}_2 x_2} \cdots e^{\hat{\beta}_k x_k} \end{align*}
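To make the back-transformation concrete, the sketch below plugs the rounded estimates from the fitted model into both forms (exponentiating the sum, and multiplying the exponentiated terms). The person described (age 50, BMI category 1) is a made-up example.

```r
# Rounded estimates from the fitted model in the notes
b0     <- 1.625
b_age  <- 0.003
b_bmi1 <- -0.064
b_bmi2 <- -0.110

# Hypothetical person: 50 years old, in BMI category 1
age  <- 50
bmi1 <- 1
bmi2 <- 0

# Linear predictor on the log scale
log_scale <- b0 + b_age * age + b_bmi1 * bmi1 + b_bmi2 * bmi2

# Two equivalent ways back to the HbA1c scale
pred_sum  <- exp(log_scale)
pred_prod <- exp(b0) * exp(b_age * age) * exp(b_bmi1 * bmi1) * exp(b_bmi2 * bmi2)

c(log_scale = log_scale, pred_sum = pred_sum, pred_prod = pred_prod)
```

The two predictions agree, which is why each coefficient can be read as a multiplicative effect once exponentiated.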
\ln(y) = 1.625 + 0.003 \text{ age} - 0.064 \text{ BMI}_1 - 0.110 \text{ BMI}_2
For a 1 year increase in age, the expected HbA1c is multiplied by e^{0.003}=1.003. This is a 0.3% increase.
For a 10 year increase in age, the expected HbA1c is multiplied by e^{0.003 \times 10}=1.030. This is a 3% increase.
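The same arithmetic in R, as a small sketch using the rounded coefficients from the fitted equation. The helper pct() is a hypothetical convenience function, and the BMI multipliers are relative to the reference BMI category.

```r
# Rounded coefficients from the fitted equation
b <- c(age = 0.003, BMI1 = -0.064, BMI2 = -0.110)

# Multiplicative change in expected HbA1c for 1 and 10 additional years of age
mult_1yr  <- unname(exp(b["age"]))
mult_10yr <- unname(exp(10 * b["age"]))

# Hypothetical helper: convert a multiplier to a percent change
pct <- function(m) 100 * (m - 1)

round(c(one_year = mult_1yr, ten_years = mult_10yr), 3)
round(pct(c(one_year = mult_1yr, ten_years = mult_10yr)), 1)

# Multipliers for the two BMI indicators, relative to the reference category
round(exp(b[c("BMI1", "BMI2")]), 3)
```

The first two lines of output reproduce the multipliers 1.003 and 1.030 quoted above. Note that a 10 year effect is e^{10 \hat{\beta}}, not 10 times the 1 year multiplier.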
What we’ve learned so far re: significance of predictors holds true with GzLM
Significance of individual (continuous or binary) predictors \to t-test
Significance of categorical (>2 categories) predictors \to ANOVA with full/reduced models
To compare the full and reduced models, we add test = "LRT" to the anova() function.
Call:
glm(formula = HbA1c ~ age + as.factor(BMI3cat) + age:as.factor(BMI3cat),
family = Gamma(link = "log"), data = data)
Coefficients:
                          Estimate Std. Error t value Pr(>|t|)
(Intercept)              1.6224094  0.0229685  70.636  < 2e-16 ***
age                      0.0033598  0.0004186   8.027 1.51e-15 ***
as.factor(BMI3cat)1     -0.0737300  0.0376780  -1.957   0.0505 .
as.factor(BMI3cat)2     -0.0769523  0.0472673  -1.628   0.1036
age:as.factor(BMI3cat)1  0.0001814  0.0006727   0.270   0.7874
age:as.factor(BMI3cat)2 -0.0006265  0.0008658  -0.724   0.4694
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Gamma family taken to be 0.03167834)
Null deviance: 74.373 on 2563 degrees of freedom
Residual deviance: 66.460 on 2558 degrees of freedom
AIC: 6942.1
Number of Fisher Scoring iterations: 4
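The interaction can be tested as a whole by comparing this model with the main-effects model. The sketch below does the likelihood ratio test by hand from the residual deviances printed in these notes, assuming (as I understand anova() for glm fits) that the deviance difference is scaled by the dispersion estimate of the larger model and referred to a chi-square distribution. Because the printed deviances are rounded, the result is approximate.

```r
# Residual deviances and degrees of freedom as printed in the notes
dev_reduced <- 66.484; df_reduced <- 2560  # main-effects model
dev_full    <- 66.460; df_full    <- 2558  # model with age-by-BMI interaction

# Dispersion estimate reported for the interaction (larger) model
disp_full <- 0.03167834

# Scaled deviance difference and its degrees of freedom
stat <- (dev_reduced - dev_full) / disp_full
df   <- df_reduced - df_full

# Upper-tail chi-square probability
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

c(statistic = stat, df = df, p_value = p_value)
```

The p-value is well above 0.05, which is consistent with the two non-significant interaction terms in the output above.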
Call:
glm(formula = HbA1c ~ age + as.factor(BMI3cat), family = Gamma(link = "log"),
data = data)
Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)          1.6245542  0.0168291  96.532  < 2e-16 ***
age                  0.0033199  0.0003007  11.042  < 2e-16 ***
as.factor(BMI3cat)1 -0.0635853  0.0078001  -8.152 5.55e-16 ***
as.factor(BMI3cat)2 -0.1101423  0.0108309 -10.169  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Gamma family taken to be 0.03165908)
Null deviance: 74.373 on 2563 degrees of freedom
Residual deviance: 66.484 on 2560 degrees of freedom
AIC: 6939
Number of Fisher Scoring iterations: 4
Age is a significant predictor of HbA1c (p < 0.001).
We need a full/reduced model comparison (the GzLM analogue of the partial F test) to determine if health status, as defined by BMI, is a significant predictor of HbA1c.
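A minimal sketch of that full/reduced comparison for the BMI term, again on simulated data because the course dataset is not part of these notes. The data frame sim and every number used to generate it are assumptions; only the glm() and anova() calls mirror the notes.

```r
set.seed(6235)

# Simulated stand-in for the course data (hypothetical values throughout)
n   <- 500
sim <- data.frame(age     = runif(n, 20, 80),
                  BMI3cat = sample(0:2, n, replace = TRUE))
mu  <- exp(1.6 + 0.003 * sim$age -
           0.06 * (sim$BMI3cat == 1) - 0.11 * (sim$BMI3cat == 2))
sim$HbA1c <- rgamma(n, shape = 30, rate = 30 / mu)

# Full model: age and BMI category
full <- glm(HbA1c ~ age + as.factor(BMI3cat),
            family = Gamma(link = "log"), data = sim)

# Reduced model: drops both BMI indicators at once
reduced <- glm(HbA1c ~ age,
               family = Gamma(link = "log"), data = sim)

# Likelihood ratio test of the BMI term (2 degrees of freedom)
lrt <- anova(reduced, full, test = "LRT")
lrt
```

The reduced model is listed first so that the second row of the table carries the test of the two BMI indicators jointly.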